What made you happy today?

HappyDB is a corpus of 100,000 crowd-sourced happy moments collected via Amazon’s Mechanical Turk. You can read more about it at https://arxiv.org/abs/1801.07746

In this project, we apply text mining and natural language processing techniques to explore, from various angles, what makes us happy.

Part 1: Text Processing

In this part, we process the text by cleaning it, stemming words, and creating a tidy object (one word per row).
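The three processing steps can be illustrated with a minimal Python sketch. It is not the pipeline used in this project: the stop-word list is a short hypothetical one, and `stem` is a deliberately crude suffix stripper standing in for a real stemmer such as Porter or Snowball. The output follows the tidy layout, with one (moment id, word) row per token.

```python
import re

# Hypothetical, deliberately short stop-word list; a real pipeline would use
# a full published list instead.
STOP_WORDS = {"a", "an", "the", "i", "my", "me", "to", "and", "with",
              "was", "of", "for", "in", "on", "at", "we", "it"}

# Endings removed by the toy stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Crude suffix-stripping stemmer (NOT Porter/Snowball), for illustration."""
    for suffix in SUFFIXES:
        # Only strip if at least three characters remain, so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text: str) -> list[str]:
    """Lowercase, turn every non-letter into a space, split, drop stop words."""
    text = re.sub(r"[^a-z]+", " ", text.lower())
    return [w for w in text.split() if w not in STOP_WORDS]


def tidy(moments: dict[int, str]) -> list[tuple[int, str]]:
    """Return one (moment_id, stemmed_word) row per token: a tidy layout."""
    return [(hmid, stem(w)) for hmid, text in moments.items() for w in clean(text)]


rows = tidy({1: "I went to dinner with my friends.",
             2: "My dog was playing in the garden!"})
print(rows)
# [(1, 'went'), (1, 'dinner'), (1, 'friend'), (2, 'dog'), (2, 'play'), (2, 'garden')]
```

Replacing `stem` with `nltk.stem.PorterStemmer().stem` would give proper Porter stems, at the cost of an extra dependency.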

Part 2: Elementary Analysis

Which topics do people mention most often in their happy moments? Are there significant differences among groups by region, age, gender and marital status? In this part, we dig deeper into the dataset.

First, let us look at what makes people happy most often across all 100,000 moments.

A few words appear frequently in happy moments: “friend”, “family”, “time”, “home”, “dinner”, “job”, “dog”, and so on. Not surprisingly, family, friends and jobs are among the things we treasure most, so they are also frequent sources of our happiness.
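A frequency ranking like this one boils down to counting tokens. The sketch below assumes the tidy (moment id, word) rows described in Part 1; the sample rows are made up.

```python
from collections import Counter


def top_words(rows, n=10):
    """Count how often each word appears across all moments.

    `rows` is assumed to be a tidy list of (moment_id, word) pairs.
    """
    counts = Counter(word for _, word in rows)
    return counts.most_common(n)


rows = [(1, "friend"), (1, "dinner"), (2, "family"), (2, "friend"),
        (3, "dog"), (3, "friend"), (3, "family")]
print(top_words(rows, 2))  # [('friend', 3), ('family', 2)]
```

This counts every occurrence. To count how many moments mention a word instead, so a word repeated within one moment counts once, deduplicate the pairs first: `Counter(word for _, word in set(rows))`.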

But do different groups mention the same keywords? Let us move on to the next stage.

Before comparing groups, let us look at how respondents are distributed across them.

[Figure: demographic distribution plots with panels for gender, marital status, country and age]

From the demographic distribution plots, we can see that gender and marital status are roughly evenly distributed, while country and age are highly imbalanced.
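"Evenly distributed" versus "highly imbalanced" can be made concrete by computing each category's share and the ratio between the largest and smallest category. A minimal sketch; the demographic columns below are invented for illustration.

```python
from collections import Counter


def shares(values):
    """Each category's share of all records, largest category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.most_common()}


def imbalance_ratio(values):
    """Size of the largest category divided by the size of the smallest."""
    counts = Counter(values)
    return max(counts.values()) / min(counts.values())


# Made-up demographic columns for ten respondents.
gender = ["m", "f", "f", "m", "m", "f", "f", "m", "m", "f"]
country = ["USA"] * 7 + ["IND"] * 2 + ["CAN"]

print(shares(country))           # {'USA': 0.7, 'IND': 0.2, 'CAN': 0.1}
print(imbalance_ratio(gender))   # 1.0
print(imbalance_ratio(country))  # 7.0
```

A ratio near 1 indicates a balanced column; large ratios flag columns where raw counts should not be compared directly.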

We want to know whether happy moments differ between male and female, single and married, USA and India, and young and old respondents.

male vs female

From this plot, we see that both male and female respondents value friends and family most. However, there are still some differences in what contributes to their happiness: games account for a large share of male happy moments, while female respondents have many happy memories related to birthdays.
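The exact construction of the comparison plots is not stated here. One common way to compare two groups of different sizes is to compare word proportions rather than raw counts, for example through a smoothed log ratio. The sketch below is that swapped-in technique with made-up token lists, not necessarily the method behind the plots.

```python
import math
from collections import Counter


def log_ratio(words_a, words_b, smoothing=1.0):
    """Smoothed log2 ratio of word proportions, group A relative to group B.

    Positive values mean the word is relatively more common in group A;
    adding `smoothing` to every count keeps words missing from one group finite.
    """
    ca, cb = Counter(words_a), Counter(words_b)
    vocab = set(ca) | set(cb)
    na = sum(ca.values()) + smoothing * len(vocab)
    nb = sum(cb.values()) + smoothing * len(vocab)
    return {
        w: math.log2(((ca[w] + smoothing) / na) / ((cb[w] + smoothing) / nb))
        for w in vocab
    }


# Invented stemmed tokens for two groups.
male = ["game", "friend", "game", "family"]
female = ["birthday", "friend", "family", "birthday"]

for word, value in sorted(log_ratio(male, female).items(), key=lambda kv: kv[1]):
    print(word, round(value, 2))
```

Because each count is divided by its own group's total, a larger group does not automatically dominate, which matters for the imbalanced country and age columns.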

single vs married

As with the male vs female comparison, both single and married respondents have a good time with their friends. We still find some interesting differences. Married people's happiness also comes largely from their wife, husband, son and daughter, probably because they share their lives with a family, while single people more often have fun with games or other events.

USA vs India

Since the country data are highly imbalanced, we restrict this comparison to the USA and India, whose sample sizes are of the same order of magnitude.
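"Of the same order of magnitude" is not defined precisely in the text. One way to make it concrete is to keep only the groups whose size is within a factor of 10 of the largest group. Both the threshold and the country codes below are assumptions for illustration.

```python
from collections import Counter


def comparable_groups(values, max_ratio=10.0):
    """Keep categories whose size is within `max_ratio` of the largest one.

    With max_ratio=10 this keeps groups of the same order of magnitude
    as the biggest group; smaller groups are dropped from the comparison.
    """
    counts = Counter(values)
    largest = max(counts.values())
    return sorted(k for k, c in counts.items() if c * max_ratio >= largest)


# Invented country column; real codes and counts may differ.
countries = ["USA"] * 50 + ["IND"] * 12 + ["CAN"] * 3 + ["GBR"] * 2
print(comparable_groups(countries))  # ['IND', 'USA']
```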

Not surprisingly, friends bring both groups a lot of happiness. But we still see some differences in core values between the two countries: American respondents mention sons and daughters about equally as sources of happiness, while Indian respondents do not.

young vs old

Probably because older people more often have a spouse and children, their happiness comes largely from their wife, husband, son and daughter, while young people's happiness comes more from games or other events.
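Splitting respondents into "young" and "old" requires a cutoff age, which the analysis does not state. The sketch below uses a hypothetical cutoff of 30 and assumes ages may arrive as strings, so values that cannot be parsed or are implausible are dropped instead of being guessed.

```python
def age_group(raw_age, cutoff=30):
    """Map a raw age value to "young", "old", or None if unusable.

    The cutoff of 30 is a hypothetical choice, not the one used in the analysis.
    """
    try:
        age = float(str(raw_age).strip())
    except ValueError:
        return None  # free-text answers that are not numbers
    if not 0 < age < 120:
        return None  # implausible values (this also rejects NaN)
    return "young" if age < cutoff else "old"


for raw in ["25", " 45 ", "prefer not to say", "233"]:
    print(repr(raw), age_group(raw))
```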

Part 3: Sentiment Analysis

Although all 100,000 happy moments describe positive experiences, their intensity may differ. In this part, we perform sentiment analysis across different groups.
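The lexicon used in this analysis is not named. A common lexicon-based approach scores each moment as the sum of its words' sentiment values. The sketch below uses a tiny made-up lexicon on an AFINN-like scale from -5 to +5, so its scores are illustrative only.

```python
import re

# Tiny hypothetical lexicon; a real analysis would load a full published one.
LEXICON = {"happy": 3, "love": 3, "great": 3, "fun": 4, "good": 3,
           "sad": -2, "died": -3, "lost": -3, "sick": -2}


def sentiment_score(text: str) -> int:
    """Sum the lexicon scores of all words in one happy moment."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


print(sentiment_score("I had a great time, so happy!"))  # 6
print(sentiment_score("My grandfather died but I was happy to see family"))  # 0
```

The second example shows how a moment that is happy overall can still score low or negative when it mentions something sad, which is one possible source of negative scores.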

male vs female

Overall, the experiences are clearly positive, but some moments still receive negative sentiment scores, possibly because they mention a tragedy. Meanwhile, female respondents have slightly higher sentiment scores than male respondents for most of their experiences.

young vs old

We again see some negative sentiment scores, and young people have slightly higher sentiment scores than old people for most of their experiences.
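The text does not say which statistic supports "slightly higher for most experiences". A simple way to compare groups is to look at each group's mean, median and share of negative scores, as in this sketch with invented scores. Checking whether such a gap is statistically meaningful would need a test such as the Mann-Whitney U test (`scipy.stats.mannwhitneyu`).

```python
from statistics import mean, median


def summarize(scores_by_group):
    """Per-group mean, median, and share of negative sentiment scores."""
    summary = {}
    for group, scores in scores_by_group.items():
        summary[group] = {
            "mean": mean(scores),
            "median": median(scores),
            "negative_share": sum(s < 0 for s in scores) / len(scores),
        }
    return summary


# Invented sentiment scores for two groups.
scores = {"female": [3, 4, 2, -1], "male": [2, 3, 1, -2]}
for group, stats in summarize(scores).items():
    print(group, stats)
```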